Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora

نویسندگان

  • Margarita Alonso Ramos
  • Leo Wanner
  • Orsolya Vincze
  • Gerard Casamayor
  • Nancy Vázquez Veiga
  • Estela Mosqueira Suárez
  • Sabela Prieto González
چکیده

Collocations play a significant role in second language acquisition. In order to be able to offer efficient support to learners, an NLP-based CALL environment for learning collocations should be based on a representative collocation error annotated learner corpus. However, so far, no theoretically-motivated collocation error tag set is available. Existing learner corpora tag collocation errors simply as “lexical errors” – which is clearly insufficient given the wide range of different collocation errors that the learners make. In this paper, we present a fine-grained three-dimensional typology of collocation errors that has been derived in an empirical study from the learner corpus CEDEL2 compiled by a team at the Autonomous University of Madrid. The first dimension captures whether the error concerns the collocation as a whole or one of its elements; the second dimension captures the language-oriented error analysis, while the third exemplifies the interpretative error analysis. To facilitate a smooth annotation along this typology, we adapted Knowtator, a flexible off-the-shelf annotation tool implemented as a Protégé plugin.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploiting a learner corpus for the development of a CALL environment for learning Spanish collocations

This paper provides an insight into ongoing research focusing on the exploitation of data from learner corpus in order to enhance the performance of an automatic tool aimed at the correction of collocation errors of L2 Spanish speakers. The procedure adopted for collocation annotation is described together with the main difficulties involved in the annotation task, such as the problem of distin...

متن کامل

Correcting Semantic Collocation Errors with L1-induced Paraphrases

We present a novel approach for automatic collocation error correction in learner English which is based on paraphrases extracted from parallel corpora. Our key assumption is that collocation errors are often caused by semantic similarity in the first language (L1language) of the writer. An analysis of a large corpus of annotated learner English confirms this assumption. We evaluate our approac...

متن کامل

Automated Suggestions for Miscollocations

One of the most common and persistent error types in second language writing is collocation errors, such as learn knowledge instead of gain or acquire knowledge, or make damage rather than cause damage. In this work-inprogress report, we propose a probabilistic model for suggesting corrections to lexical collocation errors. The probabilistic model incorporates three features: word association s...

متن کامل

Developing an Annotation Scheme for ELL Spelling Errors

This paper describes an XML annotation scheme for English Language Learner (ELL) spelling errors in learner corpora which can be used to create standardized, annotated ELL error corpora for use by researchers who are developing spelling correction tools for ELLs. We also provide an error taxonomy (with examples of each error type) upon which the scheme was based.

متن کامل

Towards a Methodology for Entity Error Analysis in Annotated Corpora

We present a methodology for error analysis in entity annotation. To increase the accuracy in corpora, there is a need for an analysis method for detecting human annotation and schema errors. We use easiness statistics and information gain to gain insights into possible causes of error in the GENIA corpus of MEDLINE abstracts.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010